Source file ⇒ hope_final.Rmd
#packages assumed by the code below (the original setup chunk is not shown)
library(dplyr)
library(tidyr)
library(ggplot2)
library(lubridate)
library(mosaic) #assumed source of read.file()
water_residential <- "/Users/jann/stat133-spring2016/uw_supplier_data040416.csv"
new_water_residential <- water_residential %>%
read.file() %>%
select(Supplier.Name, Stage.Invoked, Mandatory.Restrictions,
Reporting.Month, CALCULATED.R.GPCD.Reporting.Month..Values.calculated.by.Water.Board.staff.using.methodology.available.at.http...www.waterboards.ca.gov.waterrights.water_issues.programs.drought.docs.ws_tools.guidance_estimate_res_gpcd.pdf., Hydrologic.Region, Penalties.Assessed, X..Residential.Use)
## Reading data with read.csv()
names(new_water_residential)[names(new_water_residential) == 'CALCULATED.R.GPCD.Reporting.Month..Values.calculated.by.Water.Board.staff.using.methodology.available.at.http...www.waterboards.ca.gov.waterrights.water_issues.programs.drought.docs.ws_tools.guidance_estimate_res_gpcd.pdf.'] <- 'resid_use'
new_water_residential$Reporting.Month <- new_water_residential$Reporting.Month%>%
mdy()
#los angeles water suppliers
#want all of los angeles county so any water supplier that matches the cities within LA county
#some gsub to get this nice list of the cities in LA county
lacities <- "Agoura Hills Alhambra Arcadia Artesia Avalon Azusa Baldwin Park Bell Bell Gardens Bellflower Beverly Hills Bradbury Burbank Calabasas Carson Cerritos Claremont Commerce Compton Covina Cudahy Culver City Diamond Bar Downey Duarte El Monte El Segundo Gardena Glendale Glendora Hawaiian Gardens Hawthorne Hermosa Beach Hidden Hills Huntington Park Industry Inglewood Irwindale La Cañada Flintridge La Habra Heights La Mirada La Puente La Verne Lakewood Lancaster Lawndale Lomita Long Beach Los Angeles Lynwood Malibu Manhattan Beach Maywood Monrovia Montebello Monterey Park Norwalk Palmdale Palos Verdes Estates Paramount Pasadena Pico Rivera Pomona Rancho Palos Verdes Redondo Beach Rolling Hills Rolling Hills Estates Rosemead San Dimas San Fernando San Gabriel San Marino Santa Clarita Santa Fe Springs Santa Monica Sierra Madre Signal Hill South El Monte South Gate South Pasadena Temple City Torrance Vernon Walnut West Covina West Hollywood Westlake Village Whittier"
lacities <- gsub(" " , ")|(" , lacities)
lacities
## [1] "Agoura)|(Hills)|(Alhambra)|(Arcadia)|(Artesia)|(Avalon)|(Azusa)|(Baldwin)|(Park)|(Bell)|(Bell)|(Gardens)|(Bellflower)|(Beverly)|(Hills)|(Bradbury)|(Burbank)|(Calabasas)|(Carson)|(Cerritos)|(Claremont)|(Commerce)|(Compton)|(Covina)|(Cudahy)|(Culver)|(City)|(Diamond)|(Bar)|(Downey)|(Duarte)|(El)|(Monte)|(El)|(Segundo)|(Gardena)|(Glendale)|(Glendora)|(Hawaiian)|(Gardens)|(Hawthorne)|(Hermosa)|(Beach)|(Hidden)|(Hills)|(Huntington)|(Park)|(Industry)|(Inglewood)|(Irwindale)|(La)|(Cañada)|(Flintridge)|(La)|(Habra)|(Heights)|(La)|(Mirada)|(La)|(Puente)|(La)|(Verne)|(Lakewood)|(Lancaster)|(Lawndale)|(Lomita)|(Long)|(Beach)|(Los)|(Angeles)|(Lynwood)|(Malibu)|(Manhattan)|(Beach)|(Maywood)|(Monrovia)|(Montebello)|(Monterey)|(Park)|(Norwalk)|(Palmdale)|(Palos)|(Verdes)|(Estates)|(Paramount)|(Pasadena)|(Pico)|(Rivera)|(Pomona)|(Rancho)|(Palos)|(Verdes)|(Redondo)|(Beach)|(Rolling)|(Hills)|(Rolling)|(Hills)|(Estates)|(Rosemead)|(San)|(Dimas)|(San)|(Fernando)|(San)|(Gabriel)|(San)|(Marino)|(Santa)|(Clarita)|(Santa)|(Fe)|(Springs)|(Santa)|(Monica)|(Sierra)|(Madre)|(Signal)|(Hill)|(South)|(El)|(Monte)|(South)|(Gate)|(South)|(Pasadena)|(Temple)|(City)|(Torrance)|(Vernon)|(Walnut)|(West)|(Covina)|(West)|(Hollywood)|(Westlake)|(Village)|(Whittier"
#NOT SHOWN:then manually put a . in between cities with more than one word in the name to account for spaces
#below is the dataset that only has los angeles county levels of water usage
losangeles.avg <- new_water_residential%>%
filter(grepl("(Agoura.Hills)|(Alhambra)|(Arcadia)|(Artesia)|(Avalon)|(Azusa)|(Baldwin.Park)|(Bell)|(Bell.Gardens)|(Bellflower)|(Beverly.Hills)|(Bradbury)|(Burbank)|(Calabasas)|(Carson)|(Cerritos)|(Claremont)|(Commerce)|(Compton)|(Covina)|(Cudahy)|(Culver.City)|(Diamond.Bar)|(Downey)|(Duarte)|(El.Monte)|(El.Segundo)|(Gardena)|(Glendale)|(Glendora)|(Hawaiian.Gardens)|(Hawthorne)|(Hermosa.Beach)|(Hidden.Hills)|(Huntington.Park)|(Industry)|(Inglewood)|(Irwindale)|(La.Cañada.Flintridge)|(La.Habra.Heights)|(La.Mirada)|(La.Puente)|(La.Verne)|(Lakewood)|(Lancaster)|(Lawndale)|(Lomita)|(Long.Beach)|(Los.Angeles)|(Lynwood)|(Malibu)|(Manhattan.Beach)|(Maywood)|(Monrovia)|(Montebello)|(Monterey.Park)|(Norwalk)|(Palmdale)|(Palos.Verdes.Estates)|(Paramount)|(Pasadena)|(Pico.Rivera)|(Pomona)|(Rancho.Palos.Verdes)|(Redondo.Beach)|(Rolling.Hills)|(Rolling.Hills.Estates)|(Rosemead)|(San.Dimas)|(San.Fernando)|(San.Gabriel)|(San.Marino)|(Santa.Clarita)|(Santa.Fe.Springs)|(Santa.Monica)|(Sierra.Madre)|(Signal.Hill)|(South.El.Monte)|(South.Gate)|(South.Pasadena)|(Temple.City)|(Torrance)|(Vernon)|(Walnut)|(West.Covina)|(West.Hollywood)|(Westlake.Village)|(Whittier)", Supplier.Name))%>%
group_by(Reporting.Month)%>%
summarise(la_use_avg = mean(resid_use))
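The pattern passed to grepl() above was edited by hand. As a minimal sketch of an alternative, assuming the city names are kept as a character vector rather than one space-separated string, the alternation can be built with paste() and wrapped in word boundaries so that a short name such as "Bell" does not also match "Bellflower". The `la_cities` subset and the `suppliers` names below are made up for illustration and are not taken from the data.

```r
# Hypothetical subset of city names; the full list is much longer.
la_cities <- c("Bell", "Bell Gardens", "Hawaiian Gardens", "Rancho Palos Verdes")

# Join the names with "|" and wrap the group in word boundaries (\\b).
la_pattern <- paste0("\\b(", paste(la_cities, collapse = "|"), ")\\b")

# Made-up supplier names, only to illustrate the matching behaviour.
suppliers <- c("Bell Gardens  City of",
               "Bellflower-Somerset Mutual Water Company",
               "Rancho California Water District")

# perl = TRUE so that \\b is interpreted as a PCRE word boundary.
la_match <- grepl(la_pattern, suppliers, perl = TRUE)
```

With this approach a multi-word city is only matched as a whole, so "Rancho" alone no longer pulls in suppliers from outside the county.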
#the Bay Area spans nine counties: SF, Marin, Sonoma, Napa, Solano, Contra Costa, Alameda, Santa Clara, and San Mateo. Suppliers named "East Bay" are matched as well
sfbay <- "Alameda, California
+ Albany, California
+ American Canyon, California
+ Antioch, California
+ Atherton, California
+ B
+ Belmont, California
+ Belvedere, California
+ Benicia, California
+ Berkeley, California
+ Brentwood, California
+ Brisbane, California
+ Burlingame, California
+ C
+ Calistoga, California
+ Campbell, California
+ Clayton, California
+ Cloverdale, California
+ Colma, California
+ Concord, California
+ Corte Madera, California
+ Cotati, California
+ Cupertino, California
+ D
+ Daly City, California
+ Danville, California
+ Dixon, California
+ Dublin, California
+ E
+ East Palo Alto, California
+ El Cerrito, California
+ Emeryville, California
+ F
+ Fairfax, California
+ Foster City, California
+ Fremont, California
+ G
+ Gilroy, California
+ H
+ Half Moon Bay, California
+ Hayward, California
+ Healdsburg, California
+ Hercules, California
+ Hillsborough, California
+ L
+ Lafayette, California
+ Larkspur, California
+ Livermore, California
+ Los Altos, California
+ Los Altos Hills, California
+ Los Gatos, California
+ M
+ Martinez, California
+ Menlo Park, California
+ Mill Valley, California
+ Millbrae, California
+ Milpitas, California
+ Monte Sereno, California
+ Moraga, California
+ Morgan Hill, California
+ Mountain View, California
+ N
+ Napa, California
+ Newark, California
+ Novato, California
+ O
+ Oakland, California
+ Oakley, California
+ Orinda, California
+ P
+ Pacifica, California
+ Palo Alto, California
+ Petaluma, California
+ Piedmont, California
+ Pinole, California
+ Pittsburg, California
+ Pleasant Hill, California
+ Pleasanton, California
+ Portola Valley, California
+ R
+ Redwood City, California
+ Richmond, California
+ Rio Vista, California
+ Rohnert Park, California
+ Ross, California
+ S
+ St. Helena, California
+ San Anselmo, California
+ San Carlos, California
+ San Francisco
+ San Jose, California
+ San Leandro, California
+ San Mateo, California
+ San Pablo, California
+ San Rafael, California
+ San Ramon, California
+ Santa Clara, California
+ Santa Rosa, California
+ Saratoga, California
+ Sausalito, California
+ Sebastopol, California
+ Sonoma, California
+ South San Francisco, California
+ Suisun City, California
+ Sunnyvale, California
+ T
+ Tiburon, California
+ U
+ Union City, California
+ V
+ Vacaville, California
+ Vallejo, California
+ W
+ Walnut Creek, California
+ Windsor, California
+ Woodside, California
+ Y
+ Yountville, California"
sfbay <- gsub("California", "", sfbay) #get rid of california
sfbay <- gsub("\n[A-Z]\n", "", sfbay) #get rid of the headers for each section of cities beginning with a certain letter
sfbay <- gsub("\n", "", sfbay) #getting rid of extra newlines
sfbay <- gsub(", " , ")|(", sfbay) #inputting the separations for when I use grepl later
sfbay #want to look at it so I can copy/paste and make small edits for when I use it in grepl
## [1] "Alameda)|(+ Albany)|(+ American Canyon)|(+ Antioch)|(+ Atherton)|(+ B+ Belmont)|(+ Belvedere)|(+ Benicia)|(+ Berkeley)|(+ Brentwood)|(+ Brisbane)|(+ Burlingame)|(+ C+ Calistoga)|(+ Campbell)|(+ Clayton)|(+ Cloverdale)|(+ Colma)|(+ Concord)|(+ Corte Madera)|(+ Cotati)|(+ Cupertino)|(+ D+ Daly City)|(+ Danville)|(+ Dixon)|(+ Dublin)|(+ E+ East Palo Alto)|(+ El Cerrito)|(+ Emeryville)|(+ F+ Fairfax)|(+ Foster City)|(+ Fremont)|(+ G+ Gilroy)|(+ H+ Half Moon Bay)|(+ Hayward)|(+ Healdsburg)|(+ Hercules)|(+ Hillsborough)|(+ L+ Lafayette)|(+ Larkspur)|(+ Livermore)|(+ Los Altos)|(+ Los Altos Hills)|(+ Los Gatos)|(+ M+ Martinez)|(+ Menlo Park)|(+ Mill Valley)|(+ Millbrae)|(+ Milpitas)|(+ Monte Sereno)|(+ Moraga)|(+ Morgan Hill)|(+ Mountain View)|(+ N+ Napa)|(+ Newark)|(+ Novato)|(+ O+ Oakland)|(+ Oakley)|(+ Orinda)|(+ P+ Pacifica)|(+ Palo Alto)|(+ Petaluma)|(+ Piedmont)|(+ Pinole)|(+ Pittsburg)|(+ Pleasant Hill)|(+ Pleasanton)|(+ Portola Valley)|(+ R+ Redwood City)|(+ Richmond)|(+ Rio Vista)|(+ Rohnert Park)|(+ Ross)|(+ S+ St. Helena)|(+ San Anselmo)|(+ San Carlos)|(+ San Francisco+ San Jose)|(+ San Leandro)|(+ San Mateo)|(+ San Pablo)|(+ San Rafael)|(+ San Ramon)|(+ Santa Clara)|(+ Santa Rosa)|(+ Saratoga)|(+ Sausalito)|(+ Sebastopol)|(+ Sonoma)|(+ South San Francisco)|(+ Suisun City)|(+ Sunnyvale)|(+ T+ Tiburon)|(+ U+ Union City)|(+ V+ Vacaville)|(+ Vallejo)|(+ W+ Walnut Creek)|(+ Windsor)|(+ Woodside)|(+ Y+ Yountville)|("
bayavg <- new_water_residential%>%
filter(grepl("(Alameda)|(Albany)|(American.Canyon)|(Antioch)|(Atherton)|(Belmont)|(Belvedere)|(Benicia)|(Berkeley)|(Brentwood)|(Brisbane)|(Burlingame)|(Calistoga)|(Campbell)|(Clayton)|(Cloverdale)|(Colma)|(Concord)|(Corte.Madera)|(Cotati)|(Cupertino)|(Daly.City)|(Danville)|(Dixon)|(Dublin)|(East.Palo.Alto)|(El.Cerrito)|(Emeryville)|(Fairfax)|(Foster.City)|(Fremont)|(Gilroy)|(Half.Moon.Bay)|(Hayward)|(Healdsburg)|(Hercules)|(Hillsborough)|(Lafayette)|(Larkspur)|(Livermore)|(Los.Altos)|(Los.Altos.Hills)|(Los.Gatos)|(Martinez)|(Menlo.Park)|(Mill.Valley)|(Millbrae)|(Milpitas)|(Monte.Sereno)|(Moraga)|(Morgan.Hill)|(Mountain.View)|(Napa)|(Newark)|(Novato)|(Oakland)|(Oakley)|(Orinda)|(Pacifica)|(Palo.Alto)|(Petaluma)|(Piedmont)|(Pinole)|(Pittsburg)|(Pleasant.Hill)|(Pleasanton)|(Portola.Valley)|(Redwood.City)|(Richmond)|(Rio.Vista)|(Rohnert.Park)|(Ross)|(St..Helena)|(San.Anselmo)|(San.Carlos)|(San.Francisco)|(San.Jose)|(San.Leandro)|(San.Mateo)|(San.Pablo)|(San.Rafael)|(San.Ramon)|(Santa.Clara)|(Santa.Rosa)|(Saratoga)|(Sausalito)|(Sebastopol)|(Sonoma)|(South.San.Francisco)|(Suisun.City)|(Sunnyvale)|(Tiburon)|(Union.City)|(Vacaville)|(Vallejo)|(Walnut.Creek)|(Windsor)|(Woodside)|(Yountville)|(East.Bay)", Supplier.Name))%>%
group_by(Reporting.Month)%>%
summarise(bay_use_avg = mean(resid_use))
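The pasted Bay Area list carries console continuation prompts ("+ "), single-letter section headers, and one entry ("San Francisco") without the ", California" suffix, which is why the printed string needed manual cleanup. A minimal sketch of a line-by-line cleanup follows; it assumes a short excerpt in the same shape as the pasted list, and the names `raw`, `entries`, `cities`, and `bay_pattern` are introduced here for illustration only.

```r
# Short excerpt in the same shape as the pasted list (prompts and headers included).
raw <- "Alameda, California\n+ Albany, California\n+ B\n+ Belmont, California\n+ San Francisco\n+ San Jose, California"

entries <- strsplit(raw, "\n", fixed = TRUE)[[1]]   # one element per line
entries <- sub("^\\+ ", "", entries)                # drop the "+ " continuation prompt
entries <- entries[!grepl("^[A-Z]$", entries)]      # drop single-letter section headers
cities  <- sub(", California$", "", entries)        # keep only the city name

# Build the alternation once every entry is a clean city name.
bay_pattern <- paste0("(", paste(cities, collapse = ")|("), ")")
```

Because each line is handled separately, an entry that lacks the ", California" suffix stays its own element instead of fusing with the next city.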
# we want to compare to california as a whole so let's take the average for every month for the whole state!
state.avg <- new_water_residential%>%
group_by(Reporting.Month)%>%
summarise(state_use_avg = mean(resid_use))
Now let’s combine all three of these datasets so that we plot from a single dataset.
la_state <- state.avg%>%
inner_join(losangeles.avg, by = c("Reporting.Month" = "Reporting.Month"))
alljoined <- la_state%>%
inner_join(bayavg, by = c("Reporting.Month" = "Reporting.Month"))
#now that I have everything joined, I have to make it tidy
#using gather to make it narrow
narrowall <- alljoined%>%
gather(key = boundary, value = avg_resid_use, state_use_avg, la_use_avg, bay_use_avg)
narrowall$boundary <- gsub("state_use_avg", "California", narrowall$boundary)
narrowall$boundary <- gsub("la_use_avg", "Los Angeles County", narrowall$boundary)
narrowall$boundary <- gsub("bay_use_avg", "All 9 Bay Area Counties", narrowall$boundary)
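The three gsub() calls above relabel the gathered column names one at a time. A possible alternative, sketched here with a toy vector standing in for `narrowall$boundary`, is a named lookup vector indexed by the old values; `region_labels` and `relabelled` are illustrative names, not part of the original code.

```r
# Toy stand-in for narrowall$boundary before relabelling.
boundary <- c("state_use_avg", "la_use_avg", "bay_use_avg", "la_use_avg")

# Old column name -> label shown in the plot legend.
region_labels <- c(state_use_avg = "California",
                   la_use_avg    = "Los Angeles County",
                   bay_use_avg   = "All 9 Bay Area Counties")

# Index the lookup by the old values; unname() drops the carried-over names.
relabelled <- unname(region_labels[boundary])
```

The lookup keeps all label pairs in one place, which makes it easier to keep them in step with the colour order used when plotting.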
Time to plot!
ggplot(narrowall, aes(x=Reporting.Month, y= avg_resid_use, color = boundary)) +
geom_point() +
stat_smooth(se=FALSE, method="loess") +
labs(x = "Month", y = "Average Residential Water Usage (R-GPCD)", title = "Average Residential Water Usage vs Month") +
theme(legend.key = element_rect(colour = "black"),
plot.background = element_rect(colour = "grey"),
panel.background = element_rect(fill = "grey", colour = "black"),
panel.grid.minor = element_line(linetype = "dotted"),
axis.title = element_text(size = rel(1.5)),
axis.text = element_text(size = rel(1.0)),
legend.text = element_text(size = rel(1.0)),
plot.title = element_text(size = rel(2))
) +
scale_colour_manual("Region", values = c("green", "blue","yellow"))
Initial analysis: the most important thing to note is that average usage in the Bay Area counties is consistently lower than in LA County. The statewide, LA, and Bay Area averages all follow similar seasonal changes in usage.
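The "consistently lower" reading comes from the plot. One way to check it numerically, sketched here on toy numbers rather than the report's data, is to compare the two monthly averages row by row in a wide table shaped like `alljoined`. The data frame `toy` and its values are invented for illustration.

```r
# Toy numbers, not from the report: one row per month, wide format like `alljoined`.
toy <- data.frame(
  la_use_avg  = c(80, 95, 110),
  bay_use_avg = c(60, 70, 85)
)

# TRUE only if the Bay Area average is below the LA average in every month.
bay_always_lower <- all(toy$bay_use_avg < toy$la_use_avg)

# Month-by-month gap between the two regions.
gap <- toy$la_use_avg - toy$bay_use_avg
```

Running the same comparison on `alljoined` would show whether any month breaks the pattern and how large the gap is.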
#install.packages("mapdata")
#install.packages("ggmap")
library(mapdata)
## Loading required package: maps
##
## # maps v3.1: updated 'world': all lakes moved to separate new #
## # 'lakes' database. Type '?world' or 'news(package="maps")'. #
library(ggmap)
#data set that has drought by county
drought_severity <- "/Users/jann/stat133-spring2016/countydroughtseverity.csv"
drought_severity <- drought_severity%>%
read.file()
## Reading data with read.csv()
drought_severity$county <- gsub(" County", "", drought_severity$county)
drought_severity$county <- tolower(drought_severity$county)
names(drought_severity)[names(drought_severity) == "county"] <- "subregion"
drought_severity <- data.frame(drought_severity)
#adding month and year column for later
drought_severity$releaseDate <- as.Date(drought_severity$releaseDate)
drought_severity$month <- months(drought_severity$releaseDate)
drought_severity$year <- year(ymd(drought_severity$releaseDate))
#narrow version of drought_severity
#taking the average for each month for each category of severity for simplicity
tidy_severity <- drought_severity%>%
gather(key = category, value = value, NONE, D0, D1, D2, D3, D4)%>%
select(subregion, month, year, category, value, FIPS)%>%
group_by(month, year, subregion, category)%>%
summarise(ave_value = mean(value))
#just making sure everything is in the correct format for ease of plotting
tidy_severity <- data.frame(tidy_severity)
tidy_severity$year <- as.numeric(tidy_severity$year)
#coordinates for the california counties
CAcounties <- map_data('county')%>%
filter(region == "california")
CAcounties <- data.frame(CAcounties)
#merge drought information with coordinates for each county
CAcountiesvalues <- CAcounties%>%
right_join(tidy_severity, by = "subregion")
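Because right_join() keeps every row of `tidy_severity`, any county whose name does not match the map data ends up with missing coordinates. A minimal sketch of a pre-join check with setdiff() is shown below. The two vectors are toy stand-ins for `tidy_severity$subregion` and `CAcounties$subregion`, and the trailing space is a deliberately planted mismatch.

```r
# Toy vectors standing in for the two subregion columns.
drought_names <- c("alameda", "los angeles", "san francisco ")
map_names     <- c("alameda", "los angeles", "san francisco")

# Names present in the drought data but absent from the map data.
unmatched <- setdiff(drought_names, map_names)
```

An empty result would suggest every county in the drought data has coordinates to draw.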
#creating levels for mapping later
CAcountiesvalues$category <- factor(CAcountiesvalues$category, levels = c("D4", "D3", "D2", "D1", "D0", "NONE"))
CAcountiesvalues$month <- factor(CAcountiesvalues$month, levels = c("January", "February", "March", "April", "May","June", "July", "August","September", "October", "November","December"))
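The month levels above are typed out by hand. Base R ships the same twelve English names as the constant `month.name`, so a shorter equivalent can be sketched as follows, assuming months() returned English month names as it does in the code above. `months_seen` is a toy vector.

```r
# Toy stand-in for the month column.
months_seen <- c("March", "January", "December")

# month.name is a built-in vector: "January", "February", ..., "December".
month_factor <- factor(months_seen, levels = month.name)

# Sorting now follows calendar order instead of alphabetical order.
sorted <- as.character(sort(month_factor))
```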
#okay so this makes the really large grid with many maps, may just use this as an overview, then take pieces from it?
CAcountiesvalues%>%
ggplot() +
geom_polygon(aes(x = long, y = lat, group = group, fill = ave_value), colour = "white", size = 0.02) +
scale_fill_gradient(low = "yellow", high = "red") +
facet_grid(category ~ year) +
labs(x = "Year", y = "Drought Severity", title = "Drought Severity vs Year") +
theme(axis.line=element_blank(),axis.text.x=element_blank(),axis.text.y=element_blank(),axis.ticks=element_blank())
So the first “overview” maps may be too much to put into the presentation, because the image needs to be very large to show anything useful. I definitely think the contrast is good to see, but maybe the “Previous Drought” and “Current Drought” maps will be enough to show how much worse off we are currently. Again, I think this will work well together with the plot that Tiff made.
Comments
Well this one doesn’t necessarily need to be used since we have better maps below, but if you did want to include this I would mention how in the two previous droughts in California, no counties reached “Exceptional Drought” levels, in contrast to 2014-2016 when multiple counties are registering as having “Exceptional Drought” levels. This in conjunction with Tiffany’s plot of # of counties vs drought level would be interesting side by side, because in these maps you can see that for 2014 and 2015 there were no counties that had “NONE” or no level of drought, and in 2016 we see some counties highlighted again. So Tiffany’s plot should tell us what number of counties there are that have “NONE” while this one shows where.
selected data
data manipulation
weighted average
plot manipulation
create levels for all months/years
another map of each month since 2000 to april 2014
map from 2006-2010 to capture the previous drought in California
map from 2011 to Present to show current drought: